Antipattern: Create Multiple Columns

Let's create multiple columns for multi-column attributes.

We still have to account for multiple values in the attribute, but we know that the new solution must store only a single value in each column. It may seem natural to create multiple columns in this table, each containing a single tag.

Creating Bugs table

As we assign tags to a given bug, we’d put values in one of these three columns. Unused columns would remain null.

Updating Bugs table

Let’s try to retrieve the Bugs table data after updating the value of tag2. Press “RUN” in the following playground and see the output.

Retrieving data after updating Bugs table

Tasks we could do easily with a normal attribute are now more complex.

Searching for values#

When searching for bugs with a given tag, we must search all three columns because the tag string could occupy any of these columns.

If we’re searching for a given value across multiple columns, it is a clue that the multiple columns should be stored as a single logical attribute.

For example, to retrieve bugs that reference performance, we can use a query like this:

Retrieving bugs having performance tag

We may need to search for bugs that reference both tags, i.e., performance and printing. To do this, we use a query like the following one. It is important to use the parentheses correctly because OR has lower precedence than AND.

Retrieving bugs having performance and printing tag values

The syntax required to search for a single value over multiple columns is lengthy and tedious to write. We can make it more compact by using an IN predicate in a slightly untraditional manner:

Retrieving bugs with tags using the IN keyword for performance and printing

Adding and removing values#

Adding or removing a value from a set of columns presents its own issues. Simply using UPDATE to change one of the columns isn’t safe since we can’t be sure which column is unoccupied, if any. We may have to retrieve the row into our application to see.

Retrieving data before updating something

In this case, for instance, the result shows that tag2 is null. Now, we can form the UPDATE statement.

Updating a tag for bug_id 3456

Let’s retrieve the results after writing the query in the following playground.

Retrieving data after updating the tag

We face the risk of another client going through the same steps of reading the row and updating it before we have updated the table after quering it. Depending on who applied their update first, either we or the other client risk getting an update conflict error or having their changes overwritten by the other. We can avoid this two-step query by using complex SQL expressions.

The following statement uses the NULLIF() function to make each column null if it equals a specific value. NULLIF() returns null if its two arguments are equal.

Updating Bugs table to set NULL in place of the specific value using NULLIF function

Let’s retrieve the data after updating the table.

Retrieving the records after updating the Bugs table

The following statement adds the new tag performance to the first column that is currently null. However, if none of the three columns is null, the statement makes no change to the row, and the new tag value is not recorded. Constructing this statement is laborious. Notice also that we must repeat the string performance six times.

Updating Bugs table by adding a value in the first attribute having no value

Let’s retrieve the data after using UPDATE in a query.

Retrieving Bugs table after updating the records

Ensuring uniqueness#

We probably don’t want the same value to appear in multiple columns, but the database can’t prevent this when we use the Multi-column Attributes antipattern. In other words, it’s hard to prevent the following statement:

Inserting a new record in the Bugs table

Let’s retrieve the data in the following playground to see the results after using the INSERT statement.

Retrieving the Bugs table data after inserting a new record

Handling growing sets of values#

Another weakness of this design is that three columns might not be enough. To keep the design of one value per column, we must define as many columns as the maximum number of tags a bug can have. How can we predict, at the time we define the table, what that greatest number will be?

One tactic is to guess at a moderate number of columns and expand later, if necessary, by adding more columns. Most databases allow us to restructure existing tables to add Bugs.tag4 or even more columns as we need them.

Adding a new column in the Bugs table

However, this change is costly in three ways:

  • Restructuring a database table that already contains data may require locking the entire table and blocking access for other concurrent clients.

  • Some databases implement this kind of table restructure by defining a new table to match the desired structure, copying the data from the old table, and then dropping the old table. If the table in question has a lot of data, this transfer can take a long time.

  • When we add a column in the set for a multi-column attribute, we must revisit every SQL statement in every application that uses this table, editing the statement to support new columns.

Retrieving the records which contain value performance in any of its attribute

This is a meticulous and time-consuming development task. If we miss any queries that need edits, it can lead to bugs that are difficult to detect.

Synopsis: Multi-column Attributes
Solution: Create Dependent Table
Mark as Completed
Report an Issue